Smart Paper Notes

Hardness of Agnostically Learning Halfspaces from Worst-Case Lattice Problems

Stefan Tiegel

Categories: Machine Learning | Machine Learning (Statistics)

2022-07-28

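We show hardness of improperly learning halfspaces in the agnostic model based on worst-case lattice problems, e.g., approximating shortest vectors within polynomial factors. In particular, we show that under this assumption there is no efficient algorithm that outputs any binary hypothesis, not necessarily a halfspace, achieving misclassification error better than $\frac{1}{2} - \epsilon$ even if the optimal misclassification error is as small as $\delta$. Here, $\epsilon$ can be smaller than the inverse of any polynomial in the dimension and $\delta$ as small as $\exp\left(-\Omega\left(\log^{1-c}(d)\right)\right)$, where $0 < c < 1$ is an arbitrary constant and $d$ is the dimension. Previous hardness results [Daniely16] for this problem were based on average-case complexity assumptions, specifically variants of Feige's random 3SAT hypothesis. Our work gives the first hardness result for this problem based on a worst-case complexity assumption. It is inspired by a sequence of recent works showing hardness of learning well-separated Gaussian mixtures based on worst-case lattice problems.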
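
The guarantee can be written out as a formula. The notation here is an assumption made for illustration and is not taken from the paper: $D$ denotes a distribution over labelled examples in $\mathbb{R}^d \times \{-1,+1\}$, halfspaces are taken to be homogeneous, and $\mathrm{opt}$ is the error of the best halfspace.

```latex
\mathrm{opt} \;=\; \min_{w \in \mathbb{R}^d} \Pr_{(x,y) \sim D}\bigl[\operatorname{sign}(\langle w, x \rangle) \neq y\bigr] \;\le\; \delta
\quad\Longrightarrow\quad
\text{no efficient algorithm outputs } h \colon \mathbb{R}^d \to \{-1,+1\}
\text{ with } \Pr_{(x,y) \sim D}\bigl[h(x) \neq y\bigr] \;<\; \tfrac{1}{2} - \epsilon .
```

Since a constant hypothesis already achieves error at most $\frac{1}{2}$, the statement says that an efficient learner cannot do noticeably better than trivial guessing, even when a nearly perfect halfspace exists.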

Fast algorithm for overcomplete order-3 tensor decomposition

Jingqiu Ding , Tommaso d'Orsi , Chih-Hung Liu , Stefan Tiegel , David Steurer

Categories: Machine Learning

2022-02-14

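We develop the first fast spectral algorithm to decompose a random third-order tensor over $\mathbb{R}^d$ of rank up to $O(d^{3/2}/\text{polylog}(d))$. Our algorithm only involves simple linear algebra operations and can recover all components in time $O(d^{6.05})$ under the current matrix multiplication time. Prior to this work, comparable guarantees could only be achieved via sum-of-squares [Ma, Shi, Steurer 2016]. In contrast, fast algorithms [Hopkins, Schramm, Shi, Steurer 2016] could only decompose tensors of rank at most $O(d^{4/3}/\text{polylog}(d))$. Our algorithmic result rests on two key ingredients: a clean lifting of the third-order tensor to a sixth-order tensor, which can be expressed in the language of tensor networks, and a careful decomposition of the tensor network into a sequence of rectangular matrix multiplications, which allows for a fast implementation of the algorithm.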
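
To make the two ingredients more concrete, here is a minimal NumPy sketch. It is not the paper's algorithm: it uses a simpler, classical lifting that contracts two copies of the tensor along one mode, which yields a $d^2 \times d^2$ matrix rather than a sixth-order tensor. It shows a small tensor-network contraction written with `einsum`, the same contraction evaluated as a single rectangular matrix product, and a check against the closed form built from the components. All function names are hypothetical.

```python
import numpy as np


def random_tensor(n, d, seed=0):
    """T = sum_r a_r (x) a_r (x) a_r for n random unit vectors a_r in R^d."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))
    A /= np.linalg.norm(A, axis=1, keepdims=True)
    return A, np.einsum("ri,rj,rk->ijk", A, A, A)


def lift_einsum(T):
    """Tensor-network form: M[(i,l),(j,m)] = sum_k T[i,j,k] * T[l,m,k]."""
    d = T.shape[0]
    return np.einsum("ijk,lmk->iljm", T, T).reshape(d * d, d * d)


def lift_matmul(T):
    """Same contraction as one (d^2 x d) times (d x d^2) matrix product."""
    d = T.shape[0]
    F = T.reshape(d * d, d)  # rows indexed by (i, j), columns by k
    G = F @ F.T              # G[(i,j),(l,m)]
    # Reorder the axes from (i, j, l, m) to (i, l, j, m) before flattening.
    return G.reshape(d, d, d, d).transpose(0, 2, 1, 3).reshape(d * d, d * d)


def lift_from_components(A):
    """Closed form: sum_{r,s} <a_r, a_s> (a_r (x) a_s)(a_r (x) a_s)^T."""
    n, d = A.shape
    pairs = np.einsum("ri,sl->rsil", A, A).reshape(n * n, d * d)
    weights = (A @ A.T).reshape(-1, 1)
    return pairs.T @ (weights * pairs)


def demo(n=9, d=6):
    # n > d, i.e. an overcomplete (illustrative, tiny) instance
    A, T = random_tensor(n, d)
    return lift_einsum(T), lift_matmul(T), lift_from_components(A)
```

The three matrices returned by `demo()` agree up to rounding. Rewriting a contraction as a matrix product in this way is the kind of step that lets fast matrix multiplication determine the running time.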

Consistent Estimation for PCA and Sparse Regression with Oblivious Outliers

Tommaso d'Orsi , Chih-Hung Liu , Rajai Nasser , Gleb Novikov , David Steurer , Stefan Tiegel

Categories: Machine Learning | Machine Learning (Statistics)

2021-11-04

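We develop machinery to design efficiently computable and consistent estimators, achieving estimation error approaching zero as the number of observations grows, when facing an oblivious adversary that may corrupt responses in all but an $\alpha$ fraction of the samples. As concrete examples, we investigate two problems: sparse regression and principal component analysis (PCA). For sparse regression, we achieve consistency for optimal sample size $n \gtrsim (k \log d)/\alpha^2$ and optimal error rate $O(\sqrt{(k \log d)/(n \cdot \alpha^2)})$, where $n$ is the number of observations, $d$ is the number of dimensions, and $k$ is the sparsity of the parameter vector, allowing the fraction of inliers to be inverse-polynomial in the number of samples. Prior to this work, no estimator was known to be consistent when the fraction of inliers $\alpha$ is $o(1/\log \log n)$, even for (non-spherical) Gaussian design matrices. Results holding under weak design assumptions and in the presence of such general noise have only been shown in the dense setting (i.e., general linear regression) very recently by d'Orsi et al. [dNS21]. In the context of PCA, we attain optimal error guarantees under broad spikiness assumptions on the parameter matrix (usually used in matrix completion). Previous works could obtain non-trivial guarantees only under the assumption that the measurement noise corresponding to the inliers is polynomially small in $n$ (e.g., Gaussian with variance $1/n^2$). To devise our estimators, we equip the Huber loss with non-smooth regularizers such as the $\ell_1$ norm or the nuclear norm, and extend d'Orsi et al.'s approach [dNS21] in a novel way to analyze the loss function. Our machinery appears to be easily applicable to a wide range of estimation problems.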
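
For the sparse-regression case, the estimator type named above (Huber loss plus an $\ell_1$ regularizer) can be sketched in a few lines. This is an illustration only: the proximal-gradient solver, the parameter values `lam` and `delta`, and the synthetic data are assumptions, not the authors' procedure or tuning.

```python
import numpy as np


def huber_grad(r, delta=1.0):
    """Derivative of the Huber loss with respect to the residual r."""
    return np.clip(r, -delta, delta)


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def huber_lasso(X, y, lam=0.1, delta=1.0, n_iter=500):
    """Minimise (1/n) * sum_i huber(y_i - <x_i, b>) + lam * ||b||_1."""
    n, d = X.shape
    # The Huber derivative is 1-Lipschitz, so ||X||_2^2 / n bounds the
    # Lipschitz constant of the smooth part's gradient.
    step = n / np.linalg.norm(X, 2) ** 2
    b = np.zeros(d)
    for _ in range(n_iter):
        grad = -X.T @ huber_grad(y - X @ b, delta) / n
        b = soft_threshold(b - step * grad, step * lam)
    return b


def demo(seed=0):
    rng = np.random.default_rng(seed)
    n, d, k = 400, 50, 3
    X = rng.standard_normal((n, d))
    beta = np.zeros(d)
    beta[:k] = 5.0
    y = X @ beta + 0.1 * rng.standard_normal(n)
    # Oblivious outliers: the corruption is drawn independently of X.
    corrupted = rng.random(n) < 0.3
    y[corrupted] += 100.0 * rng.standard_normal(corrupted.sum())
    return beta, huber_lasso(X, y)
```

Calling `demo()` returns the true sparse vector and the estimate. The Huber loss caps the influence of each corrupted response, while the $\ell_1$ term shrinks coordinates outside the support towards zero.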

Political representation bias in DBpedia and Wikidata as a challenge for downstream processing

Ozgur Karadeniz , Bettina Berendt , Sercan Kiyak , Stefan Mertens , Leen d'Haenens

Categories: Natural Language Processing | Artificial Intelligence

2022-12-29

Diversity Searcher is a tool originally developed to help analyse diversity in news media texts. It relies on a form of automated content analysis and thus rests on prior assumptions and depends on certain design choices related to diversity and fairness. One such design choice is the external knowledge source(s) used. In this article, we discuss implications that these sources can have on the results of content analysis. We compare two data sources that Diversity Searcher has worked with - DBpedia and Wikidata - with respect to their ontological coverage and diversity, and describe implications for the resulting analyses of text corpora. We describe a case study of the relative over- or under-representation of Belgian political parties between 1990 and 2020 in the English-language DBpedia, the Dutch-language DBpedia, and Wikidata, and highlight the many decisions needed with regard to the design of this data analysis and the assumptions behind it, as well as implications from the results. In particular, we came across a staggering over-representation of the political right in the English-language DBpedia.

Explainable AI for Bioinformatics: Methods, Tools, and Applications

Md. Rezaul Karim , Tanhim Islam , Oya Beyan , Christoph Lange , Michael Cochez , Dietrich Rebholz-Schuhmann , Stefan Decker

Categories: Artificial Intelligence | Machine Learning

2022-12-25

Artificial intelligence (AI) systems based on deep neural networks (DNNs) and machine learning (ML) algorithms are increasingly used to solve critical problems in bioinformatics, biomedical informatics, and precision medicine. However, complex DNN or ML models, which are unavoidably opaque and perceived as black-box methods, may not be able to explain why and how they make certain decisions. Such black-box models are difficult to comprehend not only for targeted users and decision-makers but also for AI developers. Besides, in sensitive areas like healthcare, explainability and accountability are not only desirable properties of AI but also legal requirements -- especially when AI may have significant impacts on human lives. Explainable artificial intelligence (XAI) is an emerging field that aims to mitigate the opaqueness of black-box models and make it possible to interpret how AI systems make their decisions with transparency. An interpretable ML model can explain how it makes predictions and which factors affect the model's outcomes. The majority of state-of-the-art interpretable ML methods have been developed in a domain-agnostic way and originate from computer vision, automated reasoning, or even statistics. Many of these methods cannot be directly applied to bioinformatics problems without prior customization, extension, and domain adaptation. In this paper, we discuss the importance of explainability with a focus on bioinformatics. We analyse and comprehensively review model-specific and model-agnostic interpretable ML methods and tools. Via several case studies covering bioimaging, cancer genomics, and biomedical text mining, we show how bioinformatics research could benefit from XAI methods and how they could help improve decision fairness.

Reconstructing Kernel-based Machine Learning Force Fields with Super-linear Convergence

Stefan Blücher , Klaus-Robert Müller , Stefan Chmiela

Categories: Machine Learning | Machine Learning (Statistics)

2022-12-24

Kernel machines have sustained continuous progress in the field of quantum chemistry. In particular, they have proven to be successful in the low-data regime of force field reconstruction. This is because many physical invariances and symmetries can be incorporated into the kernel function to compensate for much larger datasets. So far, however, the scalability of this approach has been hindered by its cubic runtime in the number of training points. While it is known that iterative Krylov subspace solvers can overcome these burdens, they crucially rely on effective preconditioners, which are elusive in practice. Practical preconditioners need to be computationally efficient and numerically robust at the same time. Here, we consider the broad class of Nyström-type methods to construct preconditioners based on successively more sophisticated low-rank approximations of the original kernel matrix, each of which provides a different set of computational trade-offs. All considered methods estimate the relevant subspace spanned by the kernel matrix columns using different strategies to identify a representative set of inducing points. Our comprehensive study covers the full spectrum of approaches, starting from naive random sampling to leverage score estimates and incomplete Cholesky factorizations, up to exact SVD decompositions.
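
As an illustration of the simplest variant mentioned above (uniformly sampled inducing points), the following sketch builds a Nyström preconditioner for a regularised kernel system and uses it inside a hand-written conjugate gradient loop. The RBF kernel, the regularisation `lam`, the problem sizes, and all function names are assumptions chosen for the example and are not taken from the paper.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * length_scale**2))


def nystrom_preconditioner(K, idx, lam, rel_cutoff=1e-10):
    """Return v -> P^{-1} v for P = K[:, idx] K[idx, idx]^+ K[idx, :] + lam * I."""
    s, V = np.linalg.eigh(K[np.ix_(idx, idx)])
    keep = s > rel_cutoff * s.max()               # drop numerically null directions
    L = K[:, idx] @ (V[:, keep] / np.sqrt(s[keep]))  # low-rank factor: L @ L.T
    # Woodbury identity: only a small (rank x rank) system is involved.
    W = np.linalg.solve(lam * np.eye(L.shape[1]) + L.T @ L, L.T)
    return lambda v: (v - L @ (W @ v)) / lam


def pcg(A, b, apply_Minv=None, tol=1e-8, max_iter=2000):
    """(Preconditioned) conjugate gradient; returns solution and iteration count."""
    apply_Minv = apply_Minv or (lambda v: v)
    x = np.zeros_like(b)
    r = b.copy()
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for it in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter


def demo(seed=0, n=300, m=60, lam=1e-2):
    rng = np.random.default_rng(seed)
    X = rng.random((n, 3))
    y = np.sin(X.sum(axis=1)) + 0.1 * rng.standard_normal(n)
    K = rbf_kernel(X, X)
    A = K + lam * np.eye(n)
    idx = rng.choice(n, size=m, replace=False)  # naive random inducing points
    x_plain, it_plain = pcg(A, y)
    x_prec, it_prec = pcg(A, y, nystrom_preconditioner(K, idx, lam))
    return A, y, x_plain, x_prec, it_plain, it_prec
```

`demo()` solves the same system with and without the preconditioner and returns both iteration counts for comparison. More elaborate inducing-point selection (leverage scores, incomplete Cholesky) would only change how `idx` or the low-rank factor is obtained.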

Automatically Annotating Indoor Images with CAD Models via RGB-D Scans

Stefan Ainetter , Sinisa Stekovic , Friedrich Fraundorfer , Vincent Lepetit

Categories: Computer Vision

2022-12-22

We present an automatic method for annotating images of indoor scenes with the CAD models of the objects by relying on RGB-D scans. Through a visual evaluation by 3D experts, we show that our method retrieves annotations that are at least as accurate as manual annotations, and can thus be used as ground truth without the burden of manually annotating 3D data. We do this using an analysis-by-synthesis approach, which compares renderings of the CAD models with the captured scene. We introduce a 'cloning procedure' that identifies objects that have the same geometry, to annotate these objects with the same CAD models. This allows us to obtain complete annotations for the ScanNet dataset and the recent ARKitScenes dataset.

ECG-Based Electrolyte Prediction: Evaluating Regression and Probabilistic Methods

Philipp Von Bachmann , Daniel Gedon , Fredrik K. Gustafsson , Antônio H. Ribeiro , Erik Lampa , Stefan Gustafsson , Johan Sundström , Thomas B. Schön

Categories: Computer Vision | Machine Learning

2022-12-21

Objective: Imbalances of the electrolyte concentration levels in the body can lead to catastrophic consequences, but accurate and accessible measurements could improve patient outcomes. While blood tests provide accurate measurements, they are invasive and the laboratory analysis can be slow or inaccessible. In contrast, an electrocardiogram (ECG) is a widely adopted tool which is quick and simple to acquire. However, the problem of estimating continuous electrolyte concentrations directly from ECGs is not well-studied. We therefore investigate if regression methods can be used for accurate ECG-based prediction of electrolyte concentrations. Methods: We explore the use of deep neural networks (DNNs) for this task. We analyze the regression performance across four electrolytes, utilizing a novel dataset containing over 290000 ECGs. For improved understanding, we also study the full spectrum from continuous predictions to binary classification of extreme concentration levels. To enhance clinical usefulness, we finally extend to a probabilistic regression approach and evaluate different uncertainty estimates. Results: We find that the performance varies significantly between different electrolytes, which is clinically justified in the interplay of electrolytes and their manifestation in the ECG. We also compare the regression accuracy with that of traditional machine learning models, demonstrating superior performance of DNNs. Conclusion: Discretization can lead to good classification performance, but does not help solve the original problem of predicting continuous concentration levels. While probabilistic regression demonstrates potential practical usefulness, the uncertainty estimates are not particularly well-calibrated. Significance: Our study is a first step towards accurate and reliable ECG-based prediction of electrolyte concentration levels.
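
To illustrate what evaluating uncertainty estimates can involve, here is a small sketch that assumes the probabilistic model outputs a Gaussian mean and variance per ECG. The abstract does not state this, so both the Gaussian form and the two metrics below (average negative log-likelihood and prediction-interval coverage) are assumptions made for the example.

```python
import math
from statistics import NormalDist


def gaussian_nll(y, mean, var):
    """Average negative log-likelihood of targets under N(mean_i, var_i)."""
    terms = [0.5 * (math.log(2 * math.pi * v) + (t - m) ** 2 / v)
             for t, m, v in zip(y, mean, var)]
    return sum(terms) / len(terms)


def interval_coverage(y, mean, var, level=0.9):
    """Fraction of targets inside the central `level` prediction interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    hits = [abs(t - m) <= z * math.sqrt(v) for t, m, v in zip(y, mean, var)]
    return sum(hits) / len(hits)
```

For a well-calibrated model, `interval_coverage(..., level=0.9)` should be close to 0.9 on held-out data. Coverage well below the nominal level indicates overconfident variances.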

Lessons from Robot-Assisted Disaster Response Deployments by the German Rescue Robotics Center Task Force

Hartmut Surmann , Ivana Kruijff-Korbayova , Kevin Daun , Marius Schnaubelt , Oskar von Stryk , Manuel Patchou , Stefan Boecker , Christian Wietfeld , Jan Quenzel , Daniel Schleich

Categories: Robotics

2022-12-19

Earthquakes, fires, and floods often cause structural collapses of buildings. However, inspecting damaged buildings poses a high risk to emergency forces or is even impossible. We present three recent selected missions of the Robotics Task Force of the German Rescue Robotics Center, where both ground and aerial robots were used to explore destroyed buildings. We describe and reflect on the missions as well as the lessons learned from them. In order to make robots from research laboratories fit for real operations, realistic test environments were set up for outdoor and indoor use and tested in regular exercises by researchers and emergency forces. Based on this experience, the robots and their control software were significantly improved. Furthermore, top teams of researchers and first responders were formed, each with realistic assessments of the operational and practical suitability of robotic systems.

Mask-FPAN: Semi-Supervised Face Parsing in the Wild With De-Occlusion and UV GAN

Lei Li , Tianfang Zhang , Stefan Oehmcke , Fabian Gieseke , Christian Igel

Categories: Computer Vision

2022-12-18

Fine-grained semantic segmentation of a person's face and head, including facial parts and head components, has progressed a great deal in recent years. However, it remains a challenging task, in which handling ambiguous occlusions and large pose variations is particularly difficult. To overcome these difficulties, we propose a novel framework termed Mask-FPAN. It uses a de-occlusion module that learns to parse occluded faces in a semi-supervised way. In particular, face landmark localization, face occlusion estimations, and detected head poses are taken into account. A 3D morphable face model combined with the UV GAN improves the robustness of 2D face parsing. In addition, we introduce two new datasets named FaceOccMask-HQ and CelebAMaskOcc-HQ for face parsing work. The proposed Mask-FPAN framework addresses the face parsing problem in the wild and shows significant performance improvements with MIOU from 0.7353 to 0.9013 compared to the state-of-the-art on challenging face datasets.
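
The MIOU figure refers to mean intersection-over-union. A minimal sketch of how such a score can be computed from flattened label maps follows. The averaging convention (skipping classes absent from both prediction and target) is an assumption, since the abstract does not specify the evaluation protocol.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that occur in the prediction or the target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes absent from both label maps
            ious.append(inter / union)
    return sum(ious) / len(ious)
```

For example, `mean_iou([0, 0, 1, 1], [0, 1, 1, 1], 2)` averages an IoU of 1/2 for class 0 and 2/3 for class 1.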
